Compression Picks The Significant Item Sets

Authors

Matthijs van Leeuwen

Jilles Vreeken

Arno Siebes

Abstract

Finding a comprehensive set of patterns that truly captures the characteristics of a database is a complicated matter. Frequent item set mining attempts this, but low support levels often result in exorbitant numbers of item sets. Recently we showed that by using MDL we are able to select a small number of item sets that compress the data well [15]. Here we show that this small set is a good approximation of the underlying data distribution. Using the small set in an MDL-based classifier leads to performance on par with well-known rule-induction and association-rule-based methods. Advantages are that no parameters need to be set manually and only very few item sets are used. The classification scores indicate that selecting item sets through compression is an elegant way of mining interesting patterns that can subsequently find use in many applications.
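To make the idea of a compression-based classifier concrete, here is a minimal sketch. It is not the authors' algorithm: the greedy cover order, the add-one smoothing of usage counts, and all names (`cover`, `build_code_table`, `encoded_length`, `classify`) are assumptions chosen for illustration. Each class gets a code table of item sets whose code lengths follow from how often they are used to cover that class's transactions, and an unseen transaction is assigned to the class whose table encodes it in the fewest bits.

```python
import math


def cover(transaction, itemsets):
    """Greedily cover a transaction with disjoint item sets, in table order."""
    remaining = set(transaction)
    used = []
    for s in itemsets:
        if s and s <= remaining:
            used.append(s)
            remaining -= s
    return used


def build_code_table(transactions, candidates, alphabet):
    """Map each item set to a code length in bits (assumed scheme).

    Candidates are tried largest first; every single item of the alphabet
    is appended so that any transaction over the alphabet can be covered.
    """
    ordered = sorted({frozenset(c) for c in candidates},
                     key=lambda s: (-len(s), sorted(s)))
    ordered += [frozenset([i]) for i in sorted(alphabet)]
    usage = {s: 1 for s in ordered}  # add-one smoothing (assumption)
    for t in transactions:
        for s in cover(t, usage):
            usage[s] += 1
    total = sum(usage.values())
    # Shannon-optimal code length: frequently used item sets get short codes.
    return {s: -math.log2(u / total) for s, u in usage.items()}


def encoded_length(transaction, code_table):
    """Number of bits needed to encode a transaction with a code table."""
    return sum(code_table[s] for s in cover(transaction, code_table))


def classify(transaction, tables):
    """Pick the class whose code table compresses the transaction best."""
    return min(tables, key=lambda label: encoded_length(transaction, tables[label]))


# Toy data (hypothetical): two classes over the items a, b, c, d.
alphabet = {"a", "b", "c", "d"}
tables = {
    "A": build_code_table([{"a", "b"}, {"a", "b"}, {"a", "b", "c"}],
                          [{"a", "b"}], alphabet),
    "B": build_code_table([{"c", "d"}, {"c", "d"}, {"b", "c", "d"}],
                          [{"c", "d"}], alphabet),
}
print(classify({"a", "b"}, tables))
print(classify({"c", "d"}, tables))
```

In this toy setup the transaction `{"a", "b"}` is cheaper to encode with the table built from class A, because that table has a short code for the item set `{a, b}`, while class B's table must spell it out item by item. Nothing beyond the candidate item sets has to be tuned by hand, which loosely mirrors the parameter-free claim in the abstract.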

Similar resources

Data Deduplication in Parallel Mining of Frequent Item sets using MapReduce

A parallel frequent item set mining algorithm called FiDoop is presented, using the MapReduce programming model. FiDoop incorporates the frequent items ultrametric tree (FIU-tree), in which three MapReduce jobs are applied to complete the mining task. The scalability problem has been addressed by the implementation of a handful of FP-growth-like parallel FIM algorithms. In FiDoop, the mappers independently and concurren...

Compression Cluster Based Efficient k-Medoid Algorithm to Increase Scalability

The experiments are conducted on both synthetic and real data sets. The synthetic data sets used in our experiments were generated using the procedure. We refer readers to it for more details on the generation of large data sets. We report experimental results on two synthetic data sets; in these data sets, the average transaction size and the average maximal potentially freq...

Tree Based Space Partition of Trajectory Pattern Mining For Frequent Item Sets

Transaction database (TD) mining is an extension of frequent item set mining in the data mining field. The dynamic and continuously evolving nature of the database requires up hMinor algorithm, hCount and lossy coun explosion of patterns. A fixed window length and a decay factor are required to implement the explosion model. The scanning and the support evaluation for item sets are fast. Hence, the...

Automatic S-Wave Picker for Local Earthquake Tomography

High-resolution seismic tomography at local and regional scales requires large and consistent sets of arrival-time data. Algorithms combining accurate picking with an automated quality classification can be used for repicking waveforms and compiling large arrival-time data sets suitable for tomographic inversion. S-wave velocities represent a key parameter for petrological interpretation, impro...

An Efficient Frequent Pattern Mining Algorithm to Find the Existence of K-Selective Interesting Patterns in Large Dataset Using SIFPMM

Association rule mining in huge databases is one of the most popular data exploration techniques for business decision makers. Discovering frequent item sets is the fundamental process in association rule mining. Several algorithms have been introduced in the literature to find frequent patterns. Those algorithms discover all combinations of frequent item sets for a given minimum support threshold. But som...

Publication date: 2006
